A story of women's representation in the movie industry

Posted by NoNansLand on December 21, 2022

The last decade has been marked by emancipatory and feminist movements. While many inequalities persist to this day, women’s rights have progressed, along with attitudes. As a result, the lack of women’s representation in far too many domains has been put in the spotlight and can no longer be ignored. We consider that the film industry can be viewed as a reflection of society across time and space: it is a product of spectators’ expectations and social norms. However, it also influences viewers and can help popularize new scenarios and characters, which can have a significant social impact in the long term.

In this context, we propose to study movies released between 1940 and 2009 from the CMU corpus and to derive trends and specificities of female characters in order to understand the evolution of women’s representation. This analysis might help us determine whether the recent movements were preceded by an underlying improvement of women’s condition as reflected in the film industry, or whether they marked a societal break. The evolution of women’s representation can be observed through three lenses: the characters’ traits, such as age; their presence in the movie industry, measured by the number of female characters; and finally, the roles given to female characters and their involvement in the plot.

Global representation of women in the movie industry

To set the stage, let’s look at the proportion of roles played by women in the whole set of movies.

[Figure: pie chart of the proportion of roles played by women across all movies]

Obviously, something is wrong: this ratio must have evolved over the decades, and recent ratios should tend towards parity! So let’s look at it to settle the question:

[Figure: count and proportion of female characters by decade]
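The per-decade proportions in this figure, and a check of whether two decades really differ, could be computed along the following lines. This is a minimal sketch, not the authors’ code: the post does not say which statistical test was used, so a two-proportion z-test is assumed here as one common choice, and the records and function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy records: (release year, character gender). Not real data.
characters = [
    (1945, "F"), (1947, "M"), (1948, "M"),
    (2001, "F"), (2003, "F"), (2005, "M"), (2007, "M"), (2008, "M"),
]

def female_share_by_decade(records):
    """Return {decade: (female_count, total_count)}, sorted by decade."""
    counts = defaultdict(lambda: [0, 0])
    for year, gender in records:
        decade = year // 10 * 10
        counts[decade][0] += gender == "F"  # True counts as 1
        counts[decade][1] += 1
    return {d: (f, n) for d, (f, n) in sorted(counts.items())}

def two_proportion_z_test(f1, n1, f2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance).

    Assumes the pooled proportion is strictly between 0 and 1.
    """
    p1, p2 = f1 / n1, f2 / n2
    pooled = (f1 + f2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

shares = female_share_by_decade(characters)
(f40, n40), (f00, n00) = shares[1940], shares[2000]
z, p = two_proportion_z_test(f40, n40, f00, n00)
print(shares)
print(f"z = {z:.3f}, p = {p:.3f}")
```

A large p-value means the observed difference between the two decades is compatible with chance, which is the sense in which a difference is called non-significant.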

Even if we might observe an increasing trend, there is no statistically significant difference in the proportion of female characters between decades. We are far from reaching 50% of female characters on average. However, women’s representation can be studied from different angles, one of them being age distribution. Our first intuition is that actresses tend to be younger than actors, but numbers are better than words:

Once again, a clear tendency appears and confirms our intuition. We can also look at the age distribution per decade (using the interactive plot). Throughout the last century, women’s age distribution has been shifted towards younger ages compared to men’s. The gap tends to narrow, but overall, actresses are younger on average than their male colleagues, even though the same range of ages is represented. This highlights a significant attribute that sticks to female characters: they are younger, which already gives their male counterparts a hierarchical advantage. This might also be a symptom of the well-known hypersexualization of women in visual content [LUOYING YANG].

Then, to get better insight into their importance in the movie, we can analyze what the movies’ summaries say about their role and their ability to drive the plot. To do so, we chose three metrics: the agent verbs, the patient verbs, and the attributes associated with a character.

To illustrate these three metrics, consider how ‘kill’ is used in each of the following sentences:

  • Mary has killed Dr. Jones (agent verb)
  • Mary was killed by Dr. Jones (patient verb)
  • Mary is a killer (attribute)
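To make the distinction concrete, here is a minimal sketch of how a word could be assigned to one of the three metrics from a dependency parse. Everything in it is an assumption made for illustration: the dependency triples are written by hand using Stanford-style relation names (no parser is run), and the mapping rules and function names are hypothetical rather than the pipeline actually used for the corpus.

```python
# Hand-written dependency triples for the three example sentences above.
# Each triple is (relation, governor lemma, governor part of speech, dependent).
# Assumption: Stanford-style relation names, typed in by hand, not parser output.
PARSES = [
    # "Mary has killed Dr. Jones"
    [("nsubj", "kill", "VERB", "Mary"), ("dobj", "kill", "VERB", "Jones")],
    # "Mary was killed by Dr. Jones"
    [("nsubjpass", "kill", "VERB", "Mary"), ("agent", "kill", "VERB", "Jones")],
    # "Mary is a killer"
    [("nsubj", "killer", "NOUN", "Mary")],
]

AGENT_RELATIONS = {"nsubj", "agent"}
PATIENT_RELATIONS = {"dobj", "iobj", "nsubjpass"}

def role_of(relation, governor_pos):
    """Map one dependency to 'agent verb', 'patient verb', 'attribute' or None."""
    if governor_pos == "VERB":
        if relation in AGENT_RELATIONS:
            return "agent verb"
        if relation in PATIENT_RELATIONS:
            return "patient verb"
    elif governor_pos in {"NOUN", "ADJ"} and relation == "nsubj":
        # Copular sentence: the predicate noun or adjective describes the subject.
        return "attribute"
    return None

def words_for(character, parses):
    """Collect (word, role) pairs attached to one character."""
    found = []
    for triples in parses:
        for relation, governor, governor_pos, dependent in triples:
            role = role_of(relation, governor_pos)
            if dependent == character and role is not None:
                found.append((governor, role))
    return found

print(words_for("Mary", PARSES))
print(words_for("Jones", PARSES))
```

Note that the same verb lands in different metrics depending on the character: in the second sentence ‘kill’ is a patient verb for Mary but an agent verb for Dr. Jones.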

This graph shows the average percentage of words allocated to women per movie, normalized per character: at 50%, each woman has as many words allocated to her as each man, independently of the number of female or male characters. We can observe that overall, each female character is described almost as much as each male character, with an allocation varying between 40 and 50%. That’s good news! Even though women are less present, when they are present, they receive a fair share of the description.
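One way to read this per-character allocation is sketched below. The exact formula is not given in the post, so this is an assumption: the average number of words per female character is divided by the sum of the per-character averages for women and men, and movies missing one gender are skipped. The function names and toy numbers are hypothetical.

```python
def female_word_share(characters):
    """Percentage of words allocated to women in one movie, per character.

    `characters` is a list of (gender, word_count) pairs.
    Returns None when the movie lacks female or male characters,
    or when no words at all are attached to them.
    """
    female = [n for g, n in characters if g == "F"]
    male = [n for g, n in characters if g == "M"]
    if not female or not male:
        return None
    per_woman = sum(female) / len(female)
    per_man = sum(male) / len(male)
    if per_woman + per_man == 0:
        return None
    return 100 * per_woman / (per_woman + per_man)

def average_share(movies):
    """Average the per-movie shares, ignoring movies where the share is undefined."""
    shares = [s for s in map(female_word_share, movies) if s is not None]
    return sum(shares) / len(shares) if shares else None

# Hypothetical toy movies, not real data.
movies = [
    [("F", 6), ("M", 6), ("M", 6), ("M", 6)],  # one woman, three men, same words each
    [("F", 4), ("M", 6)],
    [("M", 5)],                                # no female character: skipped
]
print([female_word_share(m) for m in movies])
print(average_share(movies))
```

In the first toy movie the single woman receives only a quarter of all words, yet the per-character share is 50% because she is described as much as each man. This is what makes the metric independent of the number of characters of each gender.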

We have now determined that female characters are depicted with a similar number of words to male characters. However, we have no clue how those words are used. For all we know, a large number of words could be used to undermine women’s representation or their relevance in the plot. Therefore, let’s take a qualitative look at the words used to characterize women over time. These word clouds are generated on the entire dataset for movies from the 1940s and the 2000s.

[Figure: word clouds of agent verbs, 1940s and 2000s]

We start by observing the agent verbs, as they can be more revealing of a character’s importance. On the surface, these word clouds are very similar overall, with the same top words. Looking in detail, we notice that between the 1940s and the 2000s, words that used to be attributed only to men, such as ‘confront’, ‘manage’, and ‘shoot’, were introduced to describe women, while others, such as ‘marry’, disappeared. This highlights a slight but noticeable shift towards more active words for female characters over time.
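The comparison described above, spotting words that enter or leave the top of the word cloud between two decades, can be sketched with simple frequency counts. This is only an illustration: the verb lists are made-up toy data, and the cut-off `n` and function names are hypothetical.

```python
from collections import Counter

def top_words(verbs, n):
    """Return the set of the n most frequent words in a list of verbs."""
    return {word for word, _ in Counter(verbs).most_common(n)}

def shifted_words(earlier, later, n=3):
    """Return (words new in the later top-n, words dropped from the earlier top-n)."""
    early, late = top_words(earlier, n), top_words(later, n)
    return sorted(late - early), sorted(early - late)

# Hypothetical agent verbs attached to female characters. Not real data.
female_1940s = ["marry"] * 3 + ["tell"] * 3 + ["love"] * 2 + ["find"]
female_2000s = ["tell"] * 3 + ["confront"] * 2 + ["shoot"] * 2 + ["marry"]

appeared, disappeared = shifted_words(female_1940s, female_2000s)
print("appeared:", appeared)
print("disappeared:", disappeared)
```

The same function can be applied to the male word lists of the earlier decade to check whether a newly appearing word was previously used only for men.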

[Figure: word clouds of patient verbs, 1940s and 2000s]

Characters can also be depicted as secondary and still gather a large number of words; these would be passive words, as the character does not move the plot forward. Here, in contrast to agent verbs, we notice a difference in quantity between male and female characters. In the 1940s, female characters are described with more numerous and more diverse patient verbs, although the top words are the same. Terms like ‘love’, ‘marry’, and ‘jealous’ are attributed more to female characters, while ‘shoot’ and ‘capture’ are missing; this trend reverses in recent decades. Even though it is subtle, we notice an evolution in the words used to describe women towards words that are less passive and that were typically used for men.

The diversity of the dataset could have masked these subtle differences. Indeed, movie genres and countries of origin vary widely in the dataset, which may have hidden more specific results by smoothing the analysis.

Analysis by genre

As the movie industry is very diverse, women’s representation might be very different from one movie to another. For example, which female character first comes to your mind when you think about an action movie, and how does she compare to a typical romantic female character? In this context, we propose to study whether our previous observations on the whole cinema industry hold across the different movie genres. As many movies belong to several genres at a time, we chose to analyze three movie genres that are as independent from each other as possible, i.e. Action, Horror and Romance. First, let’s look at the proportion of female characters in each genre:

Even though not all genres display the same proportion of female characters, women are once again strongly under-represented, especially in action movies. We can then look at the other aspects of women’s representation to see whether they also differ by genre, starting with the age distributions of actresses and actors.

[Figure: age distributions of actresses and actors by genre]

Women are invariably younger than their male colleagues. However, the situation is not the same across genres: romance is the only genre showing a real improvement, progressively bridging this gap over the past hundred years. Finally, it is interesting to see how the description of women varies across genres.

Once again, separating by genre shows that looking at the general data can be deceiving. Looking at the word distribution by character, we can see that women in action movies receive significantly fewer words than male characters. Even though women’s importance in the plots increases, it remains relatively small compared to romance movies, in which it seems intuitive to give similar relative importance to women and men. Knowing that the number of words evolves differently depending on the movie genre, we can once again assess the words associated with women:

[Figure: word clouds by movie genre]

Interesting, isn’t it? As expected, the lexical fields differ from one movie genre to another. Here, we can see an overall similarity across time, but also between female and male characters. Even if female characters are underrepresented, as shown above, action movies of the 2000s seem to depict female and male characters similarly. Compared to 1970s movies, words such as ‘kill’ and ‘tell’ grow larger, and words such as ‘leave’, ‘take’, ‘save’, ‘help’ and ‘meet’ appear.

On the other hand, words in romance movies vary less through time and are consistent between genders. This is in line with our previous finding that romance movies show the most equal results. Indeed, we do not see new top words appear to the extent observed in action movies, but rather a growing prominence of words such as ‘find’, ‘make’ and ‘decide’, which were already present in the 1940s and also appear in the male word cloud.

Geographical analysis

Until now, our analysis has painted a rather sad picture of the representation of women in the movie industry. Our intuition is that not every country presents the same gender inequalities. Therefore, we looked at movies produced in the United States (US) and in India, to see whether women’s representation is homogeneous around the world.

First, we can quantitatively assess women’s relative representation. To do so, let’s look at the overall count of female characters in Indian movies by decade, compared to movies produced in the US. As one can see, there are more female characters in US movies because our dataset contains more movies produced there. The proportions of women, however, have changed only slightly, with a small upward trend over the last decades.

[Figure: count and proportion of female characters by decade in Indian and US movies]

We now know that the proportion of women is similar in both countries, around 35%. The age distribution of actresses in both countries remains to be seen. Note that since we are comparing countries that might have very different demographics, we also included the life expectancy throughout the last century for each country to allow a proper comparison of the actresses’ average age.

[Figure: average age of actors and actresses by decade in India and the US, with life expectancy]

Whereas the average ages of actors and actresses do not change significantly throughout the last century in the US, where life expectancy is always above 60 years, the average age of Indian actors and actresses increases over the decades, following the rapidly increasing life expectancy from 1920 to 1960. As soon as life expectancy reaches 50 years in the 1970s (a level reached in the 1920s in the US), the average age of actors and actresses reaches a plateau. Even though the population is aging, the film industry seems consistently attached to having relatively young people on screen, and even younger women. It is worth mentioning that the difference in average age in the US, although clearly visible, is still smaller than in Indian movies.

Knowing that the relative presence of actresses and actors might not be representative of their characters’ importance, we performed a last language analysis. As one can see, the percentage of words allocated to female characters in American movies varies slightly over the decades, but there is no significant change over the whole century: female characters are described with only around 40% of the total descriptive words. However, there is a significant decrease in the percentage of words used to describe women in Indian movies, going against the trend of increasing presence of female characters.

We have seen that women’s representation follows different trends according to genre. The preceding analysis suggests that the origin of the movie may also affect the types of words used, compared to the overall analysis. Once more, because most of the movies in the dataset originate from the US, the general results could have hidden underlying effects. Let’s have a look!

[Figure: word clouds for Indian and US movies]

It is hard to miss the words ‘daughter’ and ‘marry’, which were central in Indian movies of the 1960s. The US word clouds, in turn, resemble the overall analysis, as expected. The evolution of Indian female characters is the most striking we have seen so far among all word clouds. Indeed, the words ‘daughter’ and ‘marry’ shrink significantly, leaving room for independence-related words such as ‘decide’, ‘begin’, ‘ask’ and ‘think’. The Indian word clouds have shifted greatly towards words generally used to depict male characters. Finally, we also notice, to a lesser extent, an evolution in American movies, with more active words such as ‘decide’, ‘discover’ and ‘explain’. The Indian film industry has made undeniable progress and has moved closer to Hollywood’s standards in its treatment of women. Nevertheless, women are not yet treated as men’s equals in the plots.